A document formatting apparatus automatically arranges input document data so as to match a pre-formatted
document. Firstly, layout structure including character properties and logical properties of each text data item are
extracted from a pre-formatted document. Secondly, logical properties of each text data item are extracted from
an input document. Thirdly, each of the logical properties of the input document is compared with corresponding
logical properties of the pre-formatted document. When the logical properties of input text data are matched with
logical properties of the pre-formatted document, corresponding character properties of the pre-formatted
document are applied to the input text data. Therefore, each text data item of the input document is
automatically arranged in accordance with the preset layout structure and corresponding character properties.

DOCUMENT FORMATTING APPARATUS



The present invention relates to a document formatting apparatus for arranging a document according 
to a user's desired format. 

Recently, Desk Top Publishing (DTP) systems have come into use in offices. A DTP system offers many
different page sizes, text frame sizes and character properties as layout functions, and prints high-quality
documents.

When using a DTP system, the user must designate many kinds of layout function at the beginning of
document creation. For example, as shown in Figure 1, menu items are displayed in order and the user selects
desired items one by one. On the left side of Figure 1, document type items are displayed first. When the
user selects a thesis as his desired document type, the text data items of a thesis are displayed as shown in
the center of Figure 1. The text data items of a thesis are the title, author, place, chapter title, chapter
paragraph, section title, section paragraph and so on. When the user selects the title, character property
items for the title are displayed as shown in the upper right side of Figure 1. These are character style,
character font and character position, which are necessary for the title to be created. The user then might
select GOTHIC as the character style, 18 points as the character font and centering as the character position.
Next, the text data items are displayed again as shown in the center of Figure 1. When the user selects author,
character property items for the author are displayed as shown in the lower right side of Figure 1. The user
then selects each desired item in the same manner. In this way, the user designates all character properties of
all text data for a thesis. However, because the user designates the desired format by means of menu items
only, he cannot view the formatted document image. Moreover, if he is later displeased with the displayed
formatted document, he must edit it. At this time, a small editor window is displayed by the side of the
formatted document on the display, and the user amends the formatted document by using the window. However,
this amended format is not saved for use in the next document creation. In short, the amendment made by using
the editor window is not memorized. Therefore the user's desired final format is not applied to the next input
document. In order to memorise this final format after amendment, menu items are displayed again and the user
must designate items as he did in making the amendment. This second designation of menu items is troublesome
for the user.

Accordingly, it is one object of the present invention to provide a document formatting apparatus which
allows the user to easily understand the format which will be presented.

It is another object of the present invention to provide a document formatting apparatus which does not
require the user to designate layout function by menu items.

It is another object of the present invention to provide a document formatting apparatus which
memorises an amended format and uses it for the next input document.

Accordingly, the present invention provides a document formatting apparatus, comprising:

memory means for storing a formatted document;

extracting means for extracting layout structure information and logical property information for the text
data in a stored pre-formatted document by use of predetermined extraction rules;

analysing means for extracting logical property information for the text data in an input document; and

formatting means for determining whether the extracted logical property information from the stored
formatted document coincides with the logical property information of the input document and for editing
the text data of the input document in accordance with corresponding layout structure information of the
stored formatted document, when the logical property information coincides.

The invention also extends to a method of formatting a document comprising the steps of: 
memorising an already formatted document; 

extracting layout structure information and logical property information for the text data in the stored
formatted document by use of predetermined extraction rules;

analysing logical property information for the text data in an input document;

determining if the extracted logical property information from the stored formatted document matches
the logical property information of the input document; and

editing the text data of the input document, in accordance with corresponding layout structure
information of the stored formatted document when the logical property information matches.

One embodiment of the present invention will now be described by way of example with reference to the
accompanying drawings in which:

Figure 1 shows an example of display transition by menu selection according to the prior art.

Figure 2 shows a schematic of a document formatting apparatus of the present invention.

Figure 3 shows a situation of format application according to the present invention.

Figures 4A and 4B show memorised data of a formatted document and the extended formatted data.

Figure 5 shows a flow chart of the process for extracting format from a pre-formatted document according
to the present invention.

Figures 6A, 6B and 6C show the data structure of layout structure information and logical property
information according to the present invention.

Figures 7A and 7B show an input document and its logical property information.

Figure 8 shows a formatted input document according to the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 






Figure 2 shows a schematic of a document formatting apparatus according to the present invention.
This apparatus comprises an input section 1, text data analysis section 2, formatted document memory
section 3, format information extract section 4, format information memory section 5, format information
application section 6, output data generation section 7 and output section 8.

The input section 1 is means for entering document data consisting of character codes and various
operation commands. For example, the input section 1 is a keyboard or a tablet.

The text data analysis section 2 is means for extracting a logical property of each text data item input
through the input section 1, and sends each text data item and its corresponding logical property information
as output. (Text data items are the title, author name, place, chapter name, chapter paragraph, section name,
section paragraph and so on; together they compose the document.) In short, the text data analysis section 2
analyses each text data item by using document structure analysis rules. This analysis technique is disclosed
in USP 4813010.
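
As a rough illustration of the output of the text data analysis section 2, the following Python sketch splits an input document into text data items at line feed codes and pairs each item with a logical property. The actual document structure analysis rules are those of USP 4813010 and are not reproduced here; the `classify` callback, the function names and the toy positional rule are purely hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

# A classifier maps (index of the item, item text) to a logical property name.
Classifier = Callable[[int, str], str]


def analyse_text_data(document: str, classify: Classifier) -> List[Tuple[str, str]]:
    """Split a document at line feeds and label each non-empty item."""
    items = [line.strip() for line in document.split("\n") if line.strip()]
    return [(text, classify(i, text)) for i, text in enumerate(items)]


def toy_classifier(index: int, text: str) -> str:
    """Hypothetical positional rule, NOT the rules of USP 4813010."""
    if index == 0:
        return "title"
    if index == 1:
        return "author"
    return "chapter paragraph"


sample = "DOCUMENT FORMATTING SYSTEM\nISAMU IWAI\nThis apparatus arranges input text."
labelled = analyse_text_data(sample, toy_classifier)
```

Passing the classifier in as a parameter keeps the splitting step separate from whatever analysis rules are actually used.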

The formatted document memory section 3 stores a pre-formatted document, which was generated by
this apparatus and edited by the last user. In other words, the stored document has the format which was
selected by the user the last time. This section 3 is, for example, a magnetic disk apparatus or an optical
disk apparatus.

The format information extract section 4 extracts format information, namely the layout structure and
corresponding logical property, from the pre-formatted document stored in the document memory section 3.
In short, the various items of layout structure (page size, text frame size, character property) and the
logical property of each text data item are extracted from the pre-formatted document. The extracted format
information is stored in the format information memory section 5.

The format information application section 6 compares the logical property of the input document with
the logical property of the pre-formatted document. Then, for the text data of an input document whose
logical property is matched with the logical property of the pre-formatted document, this section 6 sends the
text data and the character property, which corresponds to the matched logical property of the pre-formatted
document, to the output data generation section 7.

The output data generation section 7 receives the corresponding page size and text frame size from the
format information memory section 5. The section 7 then sets a page area, and a text frame area inside the
page area, in its memory. Each text data item sent by the section 6 is located inside the text frame area
and edited according to its character property.

The output section 8 is a CRT display apparatus for displaying the formatted document in accordance with
the output data from the output data generation section 7.

Figure 3 shows a situation of format application. By using Figure 3, the present invention will be
explained in detail as follows. In Figure 3, document formatting apparatus 10 corresponds to all sections
(from section 1 to section 8) in Figure 2. First, new document A enters the apparatus 10 through input
section 1. As described above, text data (each sentence) of the document A is extracted by line feed code,
and the logical property of each text data is determined according to the document structure analysis rule.
On the other hand, a pre-formatted document is stored in the document memory section 3. If many types of
formatted document are stored in the section 3, the user selects his desired type of formatted document
through the input section 1. The selected formatted document M is sent to the format information extract
section 4 in the apparatus 10. Figure 4A shows a memorized data structure of a formatted document.
Each symbol's meaning in Figure 4A is as follows.

1. [Pi(PXi, PYi)]: page number i (X width, Y width)

2. [Ei-j(FXi-j, FYi-j)(XWi-j, YWi-j)]: text frame number j in page i (X starting point, Y starting point)
(X width, Y width)

3. Ei-j: the text contents in text frame Ei-(j-1) continue to the text contents in the next text frame Ei-j

4. NULL: the text contents in the text frame do not continue to the text contents in a next text frame

5. logical property

6. character property (A is character style, B is character font, C is character position or attribute)

7. text contents

8. line feed code



According to the above-mentioned rules 1-8, the format information extract section 4 extracts the layout
structure and logical property included in the formatted document. Figure 4B shows the formatted document
extended on the display according to the memorized data in Figure 4A. In short, the format information extract
section 4 is able to extract layout structure and logical property from the memorized data itself without
extending it.
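
One way to picture the memorized data of Figure 4A is as nested records: pages containing text frames, each frame holding text data items tagged with a logical property and a character property. The Python sketch below is only an illustration. The class and field names are assumptions, and the numeric page and frame dimensions are placeholders, since the description gives only the symbols (PX1, PY1) and so on. The two text items and their properties are the ones named in the description of Figure 5.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TextItem:
    logical_property: str                     # e.g. "TITLE"
    character_property: Tuple[str, int, str]  # (style, font, position or attribute)
    contents: str                             # text contents, ended by a line feed code


@dataclass
class TextFrame:
    number: str                 # e.g. "E1-1"
    start: Tuple[int, int]      # (FX, FY) starting point
    width: Tuple[int, int]      # (XW, YW) widths
    next_frame: Optional[str]   # connected frame number, or None for NULL
    items: List[TextItem] = field(default_factory=list)


@dataclass
class Page:
    number: int
    size: Tuple[int, int]       # (PX, PY) widths
    frames: List[TextFrame] = field(default_factory=list)


# Placeholder dimensions; only the labels and text come from the description.
frame = TextFrame("E1-1", start=(10, 10), width=(190, 40), next_frame=None)
frame.items.append(TextItem("TITLE", ("GOTHIC", 18, "CENTER"), "DTP SYSTEM"))
frame.items.append(TextItem("AUTHOR", ("GOTHIC", 14, "CENTER"), "ISAMU IWAI"))
page1 = Page(number=1, size=(210, 297), frames=[frame])
```

Because every property is held in the record itself, the format can be read without rendering the document, which is the point made in the preceding paragraph.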

Figure 5 shows a flow chart of the extract process used by the section 4. First, for the received
formatted document (Step S1), the section 4 extracts the first page size by using the above-mentioned extract
rule 1 (Step S2). Then, the extracted page size (PX1, PY1) with page number 1 is stored in the format
information memory section 5 (Step S3). Secondly, for the page 1 extracted at step S2, the section 4 extracts
the first text frame size by using the above-mentioned extract rule 2 (Step S4), and connection information
(NULL) is extracted by using extract rules 3 and 4. Then the extracted starting point (FX1-1, FY1-1), width
(XW1-1, YW1-1) and connection information with text frame number E1-1 are stored in the format information
memory section 5 (Step S5). Third, for the text frame E1-1 extracted at step S4, the section 4 finds the first
text data "DTP SYSTEM" by using extract rules 7 and 8, and extracts its logical property and character
property by using extract rules 5 and 6 (Step S6). Then, the extracted logical property (TITLE) and
character property (GOTHIC, 18, CENTER) are stored in the format information memory section 5 (Step S7).
Next, for the text frame E1-1, the section 4 finds the second text data "ISAMU IWAI" by using extract rules
7 and 8, and the logical property (AUTHOR) and character property (GOTHIC, 14, CENTER) are extracted. This
extract process (Step S6) is repeated for all text data in text frame E1-1 (Step S8). Next, for the page 1,
the second text frame size (FX1-2, FY1-2), (XW1-2, YW1-2) and connection information (E1-3) are extracted by
using extract rules 2, 3 and 4. For the text frame E1-2, the logical property and character property for
each text data are extracted in the same way. This extract process (Step S4 - Step S6) is repeated for all
text frames in the page 1 (Step S9). Last, for the received formatted document, the second page size is
extracted by using extract rule 1. (The second page is not shown in Figures 4A and 4B.) For the second page,
all text frames are extracted in order. Then, all logical properties and character properties in each text
frame are extracted in order. This extract process (Step S2 - Step S6) is repeated for all pages in the
formatted document.
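
The nested repetition described above (pages, then text frames, then text data) can be sketched as three loops. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: the dictionary layout, the key names and the function name `extract_format` are hypothetical, the dimensions are placeholders, and keeping the first character property seen for each logical property is an assumption the description does not spell out.

```python
def extract_format(document):
    """Walk pages, then text frames, then text data, as in the Figure 5 loop."""
    page_table, frame_table, property_table = {}, {}, {}
    for page in document["pages"]:                   # repeated for all pages
        page_table[page["number"]] = page["size"]    # steps S2, S3: page size
        for frame in page["frames"]:                 # step S9: all frames in the page
            frame_table[frame["number"]] = {         # steps S4, S5: frame size, connection
                "start": frame["start"],
                "width": frame["width"],
                "next": frame["next"],
            }
            for item in frame["items"]:              # step S8: all text data in the frame
                # Steps S6, S7. Assumption: the first character property seen
                # for a logical property is the one that is kept.
                property_table.setdefault(item["logical"], item["character"])
    return page_table, frame_table, property_table


formatted_document = {
    "pages": [
        {
            "number": 1,
            "size": (210, 297),  # placeholder for (PX1, PY1)
            "frames": [
                {
                    "number": "E1-1",
                    "start": (10, 10),   # placeholder for (FX1-1, FY1-1)
                    "width": (190, 40),  # placeholder for (XW1-1, YW1-1)
                    "next": None,        # NULL: no connected frame
                    "items": [
                        {"logical": "TITLE", "character": ("GOTHIC", 18, "CENTER"),
                         "text": "DTP SYSTEM"},
                        {"logical": "AUTHOR", "character": ("GOTHIC", 14, "CENTER"),
                         "text": "ISAMU IWAI"},
                    ],
                },
            ],
        },
    ],
}

pages, frames, properties = extract_format(formatted_document)
```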

Figures 6A, 6B and 6C show the data structure of layout structure and logical property stored in the format
information memory section 5. In Figure 6A, the page size and text frame numbers corresponding to each page
number are shown. In Figure 6B, the text frame size (starting point, width, connection) and logical property
corresponding to each text frame number are shown. Here, text frames E1-2, E1-3, E2-1 and E2-2 are
connected. In short, text data whose logical property is chapter or section is located in the above-mentioned
four text frames according to the volume of its contents. The connection mark (*) corresponding to text frame
E2-2 means a connection with the next page. In other words, Figures 6A and 6B show two pages (P1, P2) only,
but depending on the volume of a new input document, a third page (P3) may be formed and the input
document located in pages P1, P2 and P3. In this case, the last text frame E2-2 in page P2 is connected
with the first text frame in page P3.

In Figure 6C, each logical property and its character property (style, font, position or attribute) are shown.
In Figures 6A and 6B, the text frame number links page size with text frame size. In Figures 6B and 6C, the
logical property links text frame size with character property.
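
The three linked tables can be sketched as three dictionaries, with the text frame number joining the first two and the logical property joining the last two. The sketch below is illustrative only: the dictionary layout and helper names are assumptions, page sizes are placeholders, frame starting points and widths are omitted, the assignment of exactly three frames to page P1 is assumed from the frame numbers mentioned in the description, and character properties for chapter and section are left out because the description does not state them.

```python
# Figure 6A analogue: page number -> page size and text frame numbers.
page_table = {
    "P1": {"size": (210, 297), "frames": ["E1-1", "E1-2", "E1-3"]},  # placeholder sizes
    "P2": {"size": (210, 297), "frames": ["E2-1", "E2-2"]},
}

# Figure 6B analogue: text frame number -> connection and logical properties.
frame_table = {
    "E1-1": {"next": None, "logical": ["title", "author"]},
    "E1-2": {"next": "E1-3", "logical": ["chapter", "section"]},
    "E1-3": {"next": "E2-1", "logical": ["chapter", "section"]},
    "E2-1": {"next": "E2-2", "logical": ["chapter", "section"]},
    "E2-2": {"next": "*", "logical": ["chapter", "section"]},  # "*": continues on a next page
}

# Figure 6C analogue: logical property -> character property.
character_table = {
    "title": ("GOTHIC", 18, "CENTER"),
    "author": ("GOTHIC", 14, "CENTER"),
}


def frames_for(logical_property):
    """Return the frame numbers whose logical properties include the given one."""
    return [n for n, f in frame_table.items() if logical_property in f["logical"]]


def page_of(frame_number):
    """Follow a frame number back to its page (the Figure 6A/6B link)."""
    for page, entry in page_table.items():
        if frame_number in entry["frames"]:
            return page
    return None


def chain_from(frame_number):
    """List the frames reached by following connections until NULL or the "*" mark."""
    chain = [frame_number]
    nxt = frame_table[frame_number]["next"]
    while nxt not in (None, "*"):
        chain.append(nxt)
        nxt = frame_table[nxt]["next"]
    return chain
```

For example, `chain_from("E1-2")` walks the four connected frames across pages P1 and P2, stopping at the `*` mark where a third page would be formed if the input document were long enough.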

Returning to Figure 3, the format information memory section 5 in the apparatus 10 stores layout structure
and logical property as shown in Figures 6A, 6B and 6C. On the other hand, the text data analysis section 2
sends the input text data and its logical property to the format information application section 6.
Accordingly, the format information application section 6 compares the input logical property with the
logical property stored in the format information memory section 5. For input text data whose logical
property coincides with a stored logical property, the input text data and the character property
corresponding to the coincident stored logical property (as shown in Figure 6C) are sent to the output data
generation section 7 in the apparatus 10. The output data generation section 7 receives the corresponding
page size and text frame size (as shown in Figures 6A and 6B) from the format information memory section 5.
This section 7 forms a page area and text frame area in the display memory (not illustrated). Then the input
text data is located in the text frame area according to its logical property, and arranged according to the
corresponding character property. When a user is displeased with the formatted document from the apparatus
10, he edits it in accordance with his desired format. In Figure 3, this is the formatted document Ma which
was edited by the user. The formatted document Ma is stored instead of M in the formatted document memory
section 3 in the apparatus 10. In short, the latest desired format for the user is stored in the section 3.
Next, when new document data B is entered, the apparatus 10 applies the formatted document Ma (including the
latest format for the user) to the new document data B. Therefore, new document data B is arranged according
to the latest desired format. However, if the user is displeased with the new formatted document for the
document data B, he edits it. In Figure 3, this is the formatted document Mb. The formatted document Mb is
stored instead of Ma in the memory section 3. In this way, the latest document format desired by the user is
stored each time.
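
The replacement of M by Ma and then by Mb amounts to a store that always holds only the most recent formatted document. The following Python sketch illustrates that behaviour; the class and method names are hypothetical, and the formatted documents are represented by their labels only.

```python
class FormattedDocumentMemory:
    """Holds only the most recent formatted document (a stand-in for section 3)."""

    def __init__(self, initial):
        self._current = initial

    def current(self):
        """Return the formatted document to apply to the next input document."""
        return self._current

    def store_edited(self, edited):
        """Replace the stored document with the user's edited version."""
        self._current = edited


memory = FormattedDocumentMemory("M")
used_for_a = memory.current()   # document A is formatted with M
memory.store_edited("Ma")       # the user edits the result; Ma replaces M
used_for_b = memory.current()   # document B is formatted with Ma
memory.store_edited("Mb")       # the user edits again; Mb replaces Ma
```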

Next, by using Figures 7A, 7B and 8, the matching process between logical properties of the input
document and logical properties of the stored document will be explained. Figure 7A shows an example of
the input document, and Figure 7B shows the logical property of the input document. The format
information application section 6 searches the format information memory section 5 for the same logical
property as the logical property of the input document. The logical property of the first text data in the
input document is "title". Therefore, the section 6 searches for "title" among the logical properties in the
memory section 5 as shown in Figure 6C. In Figure 6C, the first logical property "title" is matched and its
character property (GOTHIC, 18, CENTER) is retrieved. Then, the first text data "DOCUMENT FORMATTING SYSTEM"
and character property (GOTHIC, 18, CENTER) are sent to the output data generation section 7. Next, the
logical property of the second text data in the input document is "author". Therefore, the section 6 searches
for "author" among the logical properties in the memory section 5. In Figure 6C, the second logical property
"author" is matched and its character property (GOTHIC, 14, CENTER) is retrieved. Then, the second text data
"ISAMU IWAI" and character property (GOTHIC, 14, CENTER) are sent to the output data generation section 7. In
this way, each input text data and corresponding character property are sent to the section 7. Figure 8
shows the formatted output of the input document data. According to the logical property in Figure 6C, the
input document data is arranged in the user's favorite format.
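
The matching walked through above reduces to a lookup keyed on the logical property. The Python sketch below reproduces the "title" and "author" example; the function name and data layout are assumptions, and skipping input items whose logical property has no stored counterpart is also an assumption, since the description only covers the matching case.

```python
def apply_format(labelled_items, character_table):
    """Pair each input text item with the character property of its matching logical property."""
    matched = []
    for text, logical_property in labelled_items:
        character_property = character_table.get(logical_property)
        if character_property is not None:  # only coincident properties are forwarded
            matched.append((text, character_property))
    return matched


# Stored logical properties and character properties, as described for Figure 6C.
character_table = {
    "title": ("GOTHIC", 18, "CENTER"),
    "author": ("GOTHIC", 14, "CENTER"),
}

# Input text data and logical properties, as described for Figures 7A and 7B.
input_items = [
    ("DOCUMENT FORMATTING SYSTEM", "title"),
    ("ISAMU IWAI", "author"),
]

output = apply_format(input_items, character_table)
```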



Claims 



EP 0 433 073 A2 






1. A document formatting apparatus, comprising: 

memory means (3) for storing a formatted document; 

extracting means (4) for extracting layout structure information and logical property information for
the text data in a stored pre-formatted document by use of predetermined extraction rules;

analysing means (2) for extracting logical property information for the text data in an input
document; and

formatting means (6) for determining whether the extracted logical property information from the
stored formatted document coincides with the logical property information of the input document and
for editing the text data of the input document in accordance with corresponding layout structure
information of the stored formatted document, when the logical property information coincides.

2. A document formatting apparatus according to claim 1, wherein said layout structure information
comprises page size, text frame size and character properties.

3. A document formatting apparatus according to claim 2, wherein said character properties include style,
font and position.

4. A document formatting apparatus according to any preceding claim, wherein said formatted document
has been manually created and edited.

5. A document formatting apparatus according to any preceding claim, wherein said text data includes
one or more of the title, author's name, place, chapter title, chapter paragraph, section title and section
paragraph.

6. A document formatting apparatus according to claim 2, wherein said extracting means is arranged (a) to
extract page size in the formatted document, (b) to extract text frame size in the page and (c) to extract
character properties of text data in the text frame, and to then repeat the same sequence for each
page.

7. A document formatting apparatus according to any preceding claim, wherein said formatted document
includes page size, text frame size, character property and logical property information at
predetermined locations.

8. A document formatting apparatus according to any of claims 2 to 7, wherein said formatting means (6)
places the text data of the input document, whose logical property coincides with the logical property
included in the text frame, in the corresponding text frame and edits the text data in accordance with
the character properties corresponding to the coincident logical property included in the text frame.

9. A document formatting apparatus according to any preceding claim, wherein said memory means (3)
stores the latest version of the formatted document.

10. A method of formatting a document comprising the steps of:

memorising an already formatted document;

extracting layout structure information and logical property information for the text data in the stored
formatted document by use of predetermined extraction rules;

analysing logical property information for the text data in an input document;

determining if the extracted logical property information from the stored formatted document
matches the logical property information of the input document; and

editing the text data of the input document, in accordance with corresponding layout structure
information of the stored formatted document when the logical property information matches.

11. A method of formatting a document according to claim 10, wherein said layout structure information is
comprised of page size, text frame size and character properties.

12. A method of formatting a document according to claim 11, wherein said character properties include
style, font and position.

13. A method of formatting a document according to any of claims 10 to 12, wherein the said formatted